Cheshire
Comprehending Spatio-temporal Data via Cinematic Storytelling using Large Language Models
Shang, Shuo, Kalnis, Panos, Jensen, Christian S.
Spatio-temporal data captures complex dynamics across both space and time, yet traditional visualizations are complex, require domain expertise, and often fail to resonate with broader audiences. Here, we propose MapMuse, a storytelling-based framework for interpreting spatio-temporal datasets, transforming them into compelling, narrative-driven experiences. We utilize large language models and employ retrieval-augmented generation (RAG) and agent-based techniques to generate comprehensive stories. Drawing on principles common in cinematic storytelling, we emphasize clarity, emotional connection, and audience-centric design. As a case study, we analyze a dataset of taxi trajectories. Two perspectives are presented: a captivating story based on a heat map that visualizes millions of taxi trip endpoints to uncover urban mobility patterns; and a detailed narrative following a single long taxi journey, enriched with city landmarks and temporal shifts. By portraying locations as characters and movement as plot, we argue that data storytelling drives insight, engagement, and action from spatio-temporal information. The case study illustrates how MapMuse can bridge the gap between data complexity and human understanding. The aim of this short paper is to provide a glimpse into the potential of the cinematic storytelling technique as an effective communication tool for spatio-temporal data, as well as to describe open problems and opportunities for future research.
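As a rough illustration of the retrieval-augmented story generation the abstract describes, the sketch below pairs a naive keyword retriever with a cinematic narration prompt. The `call_llm` helper, the `TripFact` record, and the prompt wording are hypothetical stand-ins for illustration, not MapMuse's actual interface.

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion client."""
    raise NotImplementedError

@dataclass
class TripFact:
    text: str        # e.g. "Trip 17 ended near the central station at 23:40"
    tags: set[str]   # keywords used for naive retrieval

def retrieve(facts: list[TripFact], query: str, k: int = 5) -> list[TripFact]:
    # Keyword-overlap retrieval; a real system would use vector embeddings.
    words = set(query.lower().split())
    return sorted(facts, key=lambda f: len(f.tags & words), reverse=True)[:k]

def narrate(facts: list[TripFact], query: str) -> str:
    # Ground the story in retrieved facts, then ask for a cinematic telling.
    context = "\n".join(f.text for f in retrieve(facts, query))
    prompt = ("You are a cinematic narrator. Treat locations as characters "
              "and movement as plot.\nFacts:\n" + context +
              f"\n\nTell the story of: {query}")
    return call_llm(prompt)
```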
- North America > United States > New York > New York County > New York City (0.05)
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.05)
- Europe > Portugal > Porto > Porto (0.05)
- (8 more...)
Paper2Web: Let's Make Your Paper Alive!
Chen, Yuhang, Lv, Tianpeng, Zhang, Siyi, Yin, Yixiang, Wan, Yao, Yu, Philip S., Chen, Dongping
Academic project websites can more effectively disseminate research when they clearly present core content and enable intuitive navigation and interaction. However, current approaches such as direct Large Language Model (LLM) generation, templates, or direct HTML conversion struggle to produce layout-aware, interactive sites, and a comprehensive evaluation suite for this task has been lacking. In this paper, we introduce Paper2Web, a benchmark dataset and multi-dimensional evaluation framework for assessing academic webpage generation. It incorporates rule-based metrics such as Connectivity and Completeness, human-verified LLM-as-a-Judge evaluation (covering interactivity, aesthetics, and informativeness), and PaperQuiz, which measures paper-level knowledge retention. We further present PWAgent, an autonomous pipeline that converts scientific papers into interactive and multimedia-rich academic homepages. The agent iteratively refines both content and layout through MCP tools that enhance emphasis, balance, and presentation quality. Our experiments show that PWAgent consistently outperforms end-to-end baselines such as template-based webpages and arXiv/alphaXiv versions by a large margin while maintaining low cost, achieving the Pareto front in academic webpage generation.
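A minimal sketch of the iterative refine-and-score loop the abstract attributes to PWAgent; `call_llm`, `render_page`, and `score_layout` are hypothetical helpers, and the real agent works through MCP tools rather than this bare loop.

```python
def call_llm(prompt: str) -> str: ...      # hypothetical LLM client
def render_page(html: str) -> bytes: ...   # hypothetical page renderer
def score_layout(png: bytes) -> dict: ...  # hypothetical layout critic

def refine_homepage(paper_text: str, max_rounds: int = 3) -> str:
    # Draft once, then alternate critique and revision until acceptable.
    html = call_llm(f"Draft an academic project homepage (HTML) for:\n{paper_text}")
    for _ in range(max_rounds):
        critique = score_layout(render_page(html))  # emphasis, balance, ...
        if critique["acceptable"]:
            break
        html = call_llm("Revise the homepage to address this critique.\n"
                        f"Critique: {critique['notes']}\nHTML:\n{html}")
    return html
```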
- North America > United States > Maryland (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Connecticut > New Haven County > Cheshire (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Evaluating Compliance with Visualization Guidelines in Diagrams for Scientific Publications Using Large Vision Language Models
Rückert, Johannes, Bloch, Louise, Friedrich, Christoph M.
Diagrams are widely used to visualize data in publications. The research field of data visualization defines principles and guidelines for the creation and use of these diagrams, but researchers often do not know or adhere to them, leading to misinformation through inaccurate or incomplete presentation. In this work, large Vision Language Models (VLMs) are used to analyze diagrams and identify potential problems with respect to selected data visualization principles and guidelines. To determine the suitability of VLMs for these tasks, five open-source VLMs and five prompting strategies are compared using a set of questions derived from selected data visualization guidelines. The results show that the employed VLMs can accurately analyze diagram types (F1-score 82.49 %), 3D effects (F1-score 98.55 %), axis labels (F1-score 76.74 %), lines (RMSE 1.16), colors (RMSE 1.60), and legends (F1-score 96.64 %, RMSE 0.70), while they cannot reliably provide feedback about image quality (F1-score 0.74 %) and tick marks/labels (F1-score 46.13 %). Among the employed VLMs, Qwen2.5VL performs best, and the summarizing prompting strategy performs best for most of the experimental questions. It is shown that VLMs can automatically identify a number of potential issues in diagrams, such as missing axis labels, missing legends, and unnecessary 3D effects. The approach laid out in this work can be extended to further aspects of data visualization.
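To make the evaluation concrete: yes/no guideline checks (e.g. "does the chart use 3D effects?") can be scored against ground truth with a per-question F1, as in the paper's reported metrics. The sketch below uses scikit-learn with illustrative data, not the study's actual labels.

```python
from sklearn.metrics import f1_score

# 1 = issue present, 0 = absent; four example diagrams per question.
predictions  = {"3d_effects": [1, 0, 0, 1], "axes_labeled": [1, 1, 0, 1]}
ground_truth = {"3d_effects": [1, 0, 1, 1], "axes_labeled": [1, 0, 0, 1]}

for question, preds in predictions.items():
    print(question, f1_score(ground_truth[question], preds))
```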
- South America > Colombia > Meta Department > Villavicencio (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models
Coscia, Adam, Guo, Shunan, Koh, Eunyee, Endert, Alex
As multi-turn dialogues with large language models (LLMs) grow longer and more complex, how can users better evaluate and review progress on their conversational goals? We present OnGoal, an LLM chat interface that helps users better manage goal progress. OnGoal provides real-time feedback on goal alignment through LLM-assisted evaluation, explanations for evaluation results with examples, and overviews of goal progression over time, enabling users to navigate complex dialogues more effectively. Through a study with 20 participants on a writing task, we evaluate OnGoal against a baseline chat interface without goal tracking. Using OnGoal, participants spent less time and effort to achieve their goals while exploring new prompting strategies to overcome miscommunication, suggesting tracking and visualizing goals can enhance engagement and resilience in LLM dialogues. Our findings inspired design implications for future LLM chat interfaces that improve goal communication, reduce cognitive load, enhance interactivity, and enable feedback to improve LLM performance.
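One way to picture the per-turn goal evaluation the abstract describes is a record that re-scores every goal after each assistant reply; the `Goal` fields, scoring prompt, and `call_llm` helper below are hypothetical, not OnGoal's implementation.

```python
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str: ...  # hypothetical LLM client

@dataclass
class Goal:
    description: str
    history: list[float] = field(default_factory=list)  # alignment per turn

def update_goals(goals: list[Goal], assistant_reply: str) -> None:
    # Re-score each goal after every assistant turn; the accumulated
    # history feeds a progress-over-time overview.
    for goal in goals:
        verdict = call_llm(
            f"On a 0-1 scale, how well does this reply advance the goal "
            f"'{goal.description}'? Answer with a single number.\n\n"
            f"{assistant_reply}")
        goal.history.append(float(verdict))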
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > South Korea > Busan > Busan (0.05)
- (12 more...)
MapStory: Prototyping Editable Map Animations with LLM Agents
Gunturu, Aditya, Pearman, Ben, Ihara, Keiichi, Faraji, Morteza, Wang, Bryan, Kazi, Rubaiat Habib, Suzuki, Ryo
We introduce MapStory, an LLM-powered animation prototyping tool that generates editable map animation sequences directly from natural language text by leveraging a dual-agent LLM architecture. Given a user-written script, MapStory automatically produces a scene breakdown, which decomposes the text into key map animation primitives such as camera movements, visual highlights, and animated elements. Our system includes a researcher agent that accurately queries geospatial information by leveraging an LLM with web search, enabling automatic extraction of relevant regions, paths, and coordinates while allowing users to edit and query for changes or additional information to refine the results. Additionally, users can fine-tune parameters of these primitive blocks through an interactive timeline editor. We detail the system's design and architecture, informed by formative interviews with professional animators and by an analysis of 200 existing map animation videos. Our evaluation, which includes expert interviews (N=5) and a usability study (N=12), demonstrates that MapStory enables users to create map animations with ease, facilitates faster iteration, encourages creative exploration, and lowers barriers to creating map-centric stories.
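A scene breakdown into editable primitives might look like the sketch below; the class and field names are illustrative stand-ins, not MapStory's actual schema.

```python
from dataclasses import dataclass

@dataclass
class CameraMove:
    center: tuple[float, float]   # (lat, lon)
    zoom: float
    duration_s: float

@dataclass
class Highlight:
    region: str                   # e.g. a place name resolved by the agent
    color: str
    duration_s: float

# A two-step sequence a script like "fly to Calgary, then highlight it"
# could decompose into; each block stays editable on a timeline.
scene = [
    CameraMove(center=(51.05, -114.07), zoom=10, duration_s=3.0),
    Highlight(region="Calgary", color="#ff5533", duration_s=2.0),
]
```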
- North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.86)
- North America > United States > Colorado > Boulder County > Boulder (0.40)
- North America > The Bahamas (0.14)
- (16 more...)
- Questionnaire & Opinion Survey (1.00)
- Personal > Interview (0.88)
- Research Report > New Finding (0.67)
- Media > Film (0.48)
- Media > Television (0.34)
- Media > Photography (0.34)
CHART-6: Human-Centered Evaluation of Data Visualization Understanding in Vision-Language Models
Verma, Arnav, Mukherjee, Kushin, Potts, Christopher, Kreiss, Elisa, Fan, Judith E.
Data visualizations are powerful tools for communicating patterns in quantitative data. Yet understanding any data visualization is no small feat -- succeeding requires jointly making sense of visual, numerical, and linguistic inputs arranged in a conventionalized format one has previously learned to parse. Recently developed vision-language models are, in principle, promising candidates for developing computational models of these cognitive operations. However, it is currently unclear to what degree these models emulate human behavior on tasks that involve reasoning about data visualizations. This gap reflects limitations in prior work that has evaluated data visualization understanding in artificial systems using measures that differ from those typically used to assess these abilities in humans. Here we evaluated eight vision-language models on six data visualization literacy assessments designed for humans and compared model responses to those of human participants. We found that these models performed worse than human participants on average, and this performance gap persisted even when using relatively lenient criteria to assess model performance. Moreover, while relative performance across items was somewhat correlated between models and humans, all models produced patterns of errors that were reliably distinct from those produced by human participants. Taken together, these findings suggest significant opportunities for further development of artificial systems that might serve as useful models of how humans reason about data visualizations. All code and data needed to reproduce these results are available at: https://osf.io/e25mu/?view_only=399daff5a14d4b16b09473cf19043f18.
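The "somewhat correlated" relative performance across items can be quantified with a rank correlation between per-item human and model accuracy; the sketch below uses SciPy with illustrative numbers, not the study's data.

```python
from scipy.stats import spearmanr

human_acc = [0.91, 0.64, 0.80, 0.33, 0.72]   # accuracy per test item
model_acc = [0.85, 0.40, 0.75, 0.20, 0.50]

rho, p = spearmanr(human_acc, model_acc)
print(f"Spearman rho={rho:.2f} (p={p:.3f})")
```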
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering
Chen, Zixin, Song, Sicheng, Shum, Kashun, Lin, Yanna, Sheng, Rui, Qu, Huamin
Misleading chart visualizations, which intentionally manipulate data representations to support specific claims, can distort perceptions and lead to incorrect conclusions. Despite decades of research, misleading visualizations remain a widespread and pressing issue. Recent advances in multimodal large language models (MLLMs) have demonstrated strong chart comprehension capabilities, yet no existing work has systematically evaluated their ability to detect and interpret misleading charts. This paper introduces the Misleading Chart Question Answering (Misleading ChartQA) Benchmark, a large-scale multimodal dataset designed to assess MLLMs in identifying and reasoning about misleading charts. It contains over 3,000 curated examples, covering 21 types of misleaders and 10 chart types. Each example includes standardized chart code, CSV data, and multiple-choice questions with labeled explanations, validated through multi-round MLLM checks and exhaustive expert human review. We benchmark 16 state-of-the-art MLLMs on our dataset, revealing their limitations in identifying visually deceptive practices. We also propose a novel pipeline that detects and localizes misleaders, enhancing MLLMs' accuracy in misleading chart interpretation. Our work establishes a foundation for advancing MLLM-driven misleading chart comprehension. We publicly release the sample dataset to support further research in this critical area.
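A benchmark harness over such examples reduces to a multiple-choice accuracy loop like the sketch below; the record fields and the `call_mllm` helper are hypothetical stand-ins, not the benchmark's released tooling.

```python
def call_mllm(image: bytes, prompt: str) -> str: ...  # hypothetical MLLM client

def evaluate(examples: list[dict]) -> float:
    # Accuracy over multiple-choice questions about (possibly misleading) charts.
    correct = 0
    for ex in examples:
        answer = call_mllm(image=ex["chart_image"],
                           prompt=ex["question"] + "\n" + "\n".join(ex["choices"]))
        correct += answer.strip().upper().startswith(ex["label"])  # e.g. "B"
    return correct / len(examples)
```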
- Africa (0.04)
- North America > United States > Connecticut > New Haven County > Cheshire (0.04)
- North America > United States > California > San Bernardino County > Redlands (0.04)
- (3 more...)
Protecting multimodal large language models against misleading visualizations
Tonglet, Jonathan, Tuytelaars, Tinne, Moens, Marie-Francine, Gurevych, Iryna
Visualizations play a pivotal role in daily communication in an increasingly data-driven world. Research on multimodal large language models (MLLMs) for automated chart understanding has accelerated massively, with steady improvements on standard benchmarks. However, for MLLMs to be reliable, they must be robust to misleading visualizations, charts that distort the underlying data, leading readers to draw inaccurate conclusions that may support disinformation. Here, we uncover an important vulnerability: MLLM question-answering accuracy on misleading visualizations drops on average to the level of a random baseline. To address this, we introduce the first inference-time methods to improve performance on misleading visualizations, without compromising accuracy on non-misleading ones. The most effective method extracts the underlying data table and uses a text-only LLM to answer the question based on the table. Our findings expose a critical blind spot in current research and establish benchmark results to guide future efforts in reliable MLLMs. Keywords: large language models, chart understanding, visualization
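The most effective method the abstract names, extracting the data table and answering with a text-only LLM, has a natural two-step shape; the sketch below assumes hypothetical `call_vlm` and `call_llm` client helpers and illustrative prompt wording.

```python
def call_vlm(image: bytes, prompt: str) -> str: ...  # hypothetical VLM client
def call_llm(prompt: str) -> str: ...                # hypothetical text-only LLM

def answer_via_table(chart_image: bytes, question: str) -> str:
    # Step 1: recover the underlying data, discarding the (possibly
    # misleading) visual encoding.
    table = call_vlm(image=chart_image,
                     prompt="Extract the underlying data table as CSV.")
    # Step 2: answer from the table alone with a text-only model.
    return call_llm("Answer using only this data table.\n"
                    f"Table:\n{table}\n\nQuestion: {question}")
```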
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Connecticut > New Haven County > Cheshire (0.04)
- (7 more...)
- Research Report > Experimental Study (0.94)
- Research Report > New Finding (0.67)
- Government (0.93)
- Media > News (0.54)
Preliminary Report: Enhancing Role Differentiation in Conversational HCI Through Chromostereopsis
We propose leveraging chromostereopsis, a perceptual phenomenon inducing depth perception through color contrast, as a novel approach to visually differentiating conversational roles in text-based AI interfaces. Building upon traditional methods that utilize color-coding and textual formatting, our approach employs a dark terminal background to enhance the optical illusion.
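The effect rests on pairing saturated warm and cool colors on a dark ground (red tends to appear nearer than blue). A minimal terminal sketch using standard ANSI escape codes, with illustrative role names:

```python
# Bright red vs. bright blue on a dark terminal background; the warm/cool
# contrast drives the chromostereoptic depth illusion.
RED, BLUE, RESET = "\033[91m", "\033[94m", "\033[0m"

def show(role: str, text: str) -> None:
    color = RED if role == "user" else BLUE
    print(f"{color}{role}: {text}{RESET}")

show("user", "Summarize the report.")
show("assistant", "Here are the key points...")
```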
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Connecticut > New Haven County > Cheshire (0.04)
Grounded Language Design for Lightweight Diagramming for Formal Methods
Prasad, Siddhartha, Greenman, Ben, Nelson, Tim, Krishnamurthi, Shriram
Model finding, as embodied by SAT solvers and similar tools, is used widely, both in embedding settings and as a tool in its own right. For instance, tools like Alloy target SAT to enable users to incrementally define, explore, verify, and diagnose sophisticated specifications for a large number of complex systems. These tools critically include a visualizer that lets users graphically explore these generated models. As we show, however, default visualizers, which know nothing about the domain, are unhelpful and even actively violate presentational and cognitive principles. At the other extreme, full-blown visualizations require significant effort as well as knowledge a specifier might not possess; they can also exhibit bad failure modes (including silent failure). Instead, we need a language to capture essential domain information for lightweight diagramming. We ground our language design in both the cognitive science literature on diagrams and on a large number of example custom visualizations. This identifies the key elements of lightweight diagrams. We distill these into a small set of orthogonal primitives. We extend an Alloy-like tool to support these primitives. We evaluate the effectiveness of the produced diagrams, finding them good for reasoning. We then compare this against many other drawing languages and tools to show that this work defines a new niche that is lightweight, effective, and driven by sound principles.
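To give a feel for what "a small set of orthogonal primitives" for lightweight diagramming might look like, here is an illustrative sketch; the primitive names and fields are hypothetical, not the paper's actual language.

```python
from dataclasses import dataclass

@dataclass
class Orient:            # constrain relative placement of two atoms
    left: str
    right: str

@dataclass
class Group:             # visually cluster related atoms
    members: list[str]

@dataclass
class Icon:              # give an atom a domain-meaningful glyph
    atom: str
    shape: str

# A tiny diagram spec composed from orthogonal primitives.
spec = [Orient("producer", "consumer"),
        Group(["buffer0", "buffer1"]),
        Icon("lock", "padlock")]
```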
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- (6 more...)